Input Files
A SSAGES input file contains multiple sections that define the Collective Variables (CVs), Methods, and various other components that go into an advanced sampling simulation. Below is a brief primer on JSON, the format SSAGES uses for input files. The remaining topics describe the basic syntax and requirements for each section of an input file. Detailed information on particular methods or collective variables can be found in their respective sections of this manual.
JSON
SSAGES is run using input files written in the JSON file format. JSON is a lightweight text format that is easy for humans to read and write and for machines to parse and generate. Almost every programming language offers some level of native JSON support, which makes it particularly convenient to script or automate input file generation using, say, Python. If you've never used JSON before, don't worry. Throughout the documentation we make no assumptions about the user's knowledge of JSON and provide clear, easy-to-follow examples.
A SSAGES input file is a valid JSON document. Here, we will define a bit of terminology relating to JSON. Take the following JSON structure as an example, obtained from Wikipedia:
{
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021"
},
"phoneNumber": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "fax",
"number": "646 555-4567"
}
],
"gender": {
"type": "male"
}
}
The outermost pair of curly brackets defines the root section, which we will denote by #. An item in the hierarchy, such as the street address, can then be referenced like this: #/address/streetAddress.
Square brackets [] in JSON denote arrays, while curly brackets denote objects. They can be thought of as Python lists and dictionaries, respectively. This makes #/phoneNumber an array of phone number objects, each containing a type and a number. The fax number can be referenced by #/phoneNumber/1/number, where 1 is the array index, counting from zero.
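To make the path notation concrete, here is a small Python sketch that follows a #/...-style path through a parsed copy of the example above. The helper resolve is hypothetical and not part of SSAGES. The notation resembles the URI-fragment form of JSON Pointer (RFC 6901), but this sketch does not handle JSON Pointer's ~0 and ~1 escapes.

```python
import json

# The example document from above, abbreviated to the parts queried here.
DOC = """
{
  "age": 25,
  "address": {"streetAddress": "21 2nd Street", "postalCode": "10021"},
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
"""


def resolve(data, path):
    """Follow a '#/a/b/1/c'-style path through parsed JSON.

    Object members are looked up by key; array elements by zero-based index.
    """
    if path == "#":
        return data  # '#' alone is the root section
    if not path.startswith("#/"):
        raise ValueError("path must start at the root, '#/'")
    node = data
    for part in path[2:].split("/"):
        # Lists (JSON arrays) take an integer index, dicts (objects) a key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


data = json.loads(DOC)
print(resolve(data, "#/phoneNumber/1/number"))  # prints 646 555-4567
```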
Keys in a JSON object (Python dictionary) are unique. In the example above, #/age can only be defined once: it is a key in the root object. Defining #/age again will not throw an error; instead, the last definition overrides any previous ones. This is actually very powerful behavior, because it means a user can import a general template JSON file and override whatever parameters they wish. The exact behavior of the merging process is described in detail in the user guide.
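The override behavior can be illustrated in Python, whose standard json module also keeps the last value of a duplicated key (RFC 8259 only says object names SHOULD be unique, so other parsers may behave differently). The merge function below is a hypothetical illustration of applying overrides to a template; it is not the merging code SSAGES uses, whose exact rules are given in the user guide.

```python
import copy
import json

# Python's json module keeps the LAST value of a duplicated key.
parsed = json.loads('{"age": 25, "age": 30}')
print(parsed["age"])  # prints 30


def merge(template, overrides):
    """Return a copy of template with overrides applied recursively.

    Nested objects are merged key by key; any other value (number,
    string, array) in overrides replaces the template's value outright.
    Illustrative sketch only, not SSAGES's merging code.
    """
    result = copy.deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


template = {"walkers": 1, "logger": {"frequency": 100, "output_file": "cvs.dat"}}
custom = merge(template, {"logger": {"frequency": 10}})
print(custom["logger"])  # prints {'frequency': 10, 'output_file': 'cvs.dat'}
```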
Types matter in JSON. Notice how #/age is specified by a number that is not surrounded by quotes. This is a number, more specifically an integer. On the other hand, #/address/postalCode is a string, even though the contents of the string are all digits. Certain fields in a SSAGES input file may be required to be a string, integer, or number. The user should be aware of this and take care to format the input file appropriately.
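A quick way to see how types survive parsing is Python's json module, which maps JSON integers to int, numbers with a fractional part to float, and quoted values to str. This only illustrates JSON itself; it does not reproduce any type checks SSAGES performs on its fields.

```python
import json

data = json.loads('{"age": 25, "address": {"postalCode": "10021"}}')

# Unquoted 25 parses as an integer; quoted "10021" stays a string.
print(type(data["age"]).__name__)                    # prints int
print(type(data["address"]["postalCode"]).__name__)  # prints str

# A number written with a decimal point parses as a float in Python.
print(type(json.loads("1.0")).__name__)              # prints float

# Serializing keeps the distinction: integers unquoted, strings quoted.
print(json.dumps({"walkers": 2, "input": "in.system"}))
```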
Simulation Properties
A SSAGES build is compiled with support for a particular MD engine, and the requirements for each engine vary slightly. For detailed information on specific engines and their options, see the Engines section. The following parameters are needed to define a simulation in the JSON root.
Warning
The properties specified below are case-sensitive. Please be sure to check that you have defined them according to the documentation.
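Because property names are case-sensitive, a key such as "cvs" is not the same property as "CVs". As an illustration only, the hypothetical Python check below flags root keys that match one of the properties documented on this page when case is ignored but not otherwise. The KNOWN_KEYS set lists only the properties described here, not every key an engine or SSAGES may accept.

```python
# Root properties documented on this page; spelling, including case, matters.
KNOWN_KEYS = {"input", "args", "walkers", "CVs", "methods", "logger"}


def case_mismatches(config):
    """Map each root key that matches a known property only when case is
    ignored to the correctly spelled property name."""
    lowered = {k.lower(): k for k in KNOWN_KEYS}
    return {
        key: lowered[key.lower()]
        for key in config
        if key not in KNOWN_KEYS and key.lower() in lowered
    }


print(case_mismatches({"cvs": [], "Walkers": 2, "methods": []}))
# prints {'cvs': 'CVs', 'Walkers': 'walkers'}
```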
Input
The "input" property specifies the name of the input file used by the simulation engine.
"input": "in.system"
"input": ["in.system1","in.system2","in.system3"]
The first syntax is used if there is a single input file. For multi-walker simulations, it is possible either to use a single file for all walkers (though, depending on the method, this may not be recommended) or to specify a separate input file for each walker, as in the second syntax.
Note
This property is not used by GROMACS (see the "args" property).
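When generating inputs with a script, the two "input" forms can be reduced to one list containing a file for each walker. The sketch below is a hypothetical helper, not SSAGES code. It assumes that a missing "walkers" property means a single walker and that the array form must list exactly one file per walker.

```python
def input_files(config):
    """Return one engine input file per walker (hypothetical helper).

    A single string is reused for every walker; an array is assumed to
    contain exactly one entry per walker.
    """
    walkers = config.get("walkers", 1)  # assumption: default of one walker
    inp = config["input"]
    if isinstance(inp, str):
        return [inp] * walkers
    if len(inp) != walkers:
        raise ValueError(f"expected {walkers} input files, got {len(inp)}")
    return list(inp)


print(input_files({"walkers": 3, "input": "in.system"}))
# prints ['in.system', 'in.system', 'in.system']
```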
Args
Warning
This property is exclusively for GROMACS and HOOMD-blue.
The "args" property specifies additional command-line arguments to be passed to the engine.
"args": ["-v", "-deffnm", "runfile"]
"args": "-v -deffnm runfile"
For GROMACS, a standard simulation that executes the binary file runfile.tpr is invoked with gmx mdrun -deffnm runfile; when running through SSAGES, the equivalent arguments must be specified in the "args" property instead. This gives the user the flexibility of passing command-line arguments in the same fashion as with the standard mdrun utility. The only exception is multi-walker simulations. To use the multi-walker capabilities, write "args" exactly as for a single-walker simulation and do not specify the -multi option; this is handled automatically. If -deffnm is used, GROMACS expects the .tpr files for each walker to be named according to the walker ID, starting from zero. In the example above, if there were three walkers, GROMACS would look for the files “runfile0.tpr”, “runfile1.tpr”, and “runfile2.tpr”.
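The equivalence of the two "args" forms and the per-walker GROMACS naming convention can be illustrated in Python. Both helpers are hypothetical. Splitting the string form with shlex.split (POSIX shell rules) is an assumption about how the string is tokenized, not a description of SSAGES's parser.

```python
import shlex


def args_list(args):
    """Normalize "args" to a list: arrays are used as-is, strings are
    split the way a POSIX shell would split them (assumption)."""
    return list(args) if isinstance(args, list) else shlex.split(args)


def tpr_files(deffnm, walkers):
    """.tpr names expected with -deffnm in a multi-walker run,
    numbered by walker ID starting from zero."""
    return [f"{deffnm}{i}.tpr" for i in range(walkers)]


print(args_list("-v -deffnm runfile"))  # prints ['-v', '-deffnm', 'runfile']
print(tpr_files("runfile", 3))  # prints ['runfile0.tpr', 'runfile1.tpr', 'runfile2.tpr']
```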
Walkers
The "walkers" property specifies the number of walkers (independent instances of the simulation engine) to run with SSAGES.
"walkers": 5
Many advanced sampling methods support multi-walker simulations, which improve the convergence of many algorithms. Typically, each walker has an independent system configuration in a separate input file. Note that when more than one walker is specified, the number of processors passed to mpiexec must be divisible by the number of walkers requested; otherwise, SSAGES will terminate with an error.
Note
At this time, it is not possible to allocate a different number of processors to each walker.
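The divisibility rule amounts to an integer division with no remainder, since every walker receives the same number of processors. The hypothetical helper below computes that share and raises an error in exactly the case where SSAGES would terminate.

```python
def procs_per_walker(nprocs, walkers):
    """Processors given to each walker; all walkers get the same share."""
    if walkers < 1:
        raise ValueError("need at least one walker")
    if nprocs % walkers != 0:
        raise ValueError(
            f"{nprocs} processors cannot be split evenly over {walkers} walkers"
        )
    return nprocs // walkers


print(procs_per_walker(4, 2))  # prints 2
```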
Collective Variables
The "CVs" property specifies the collective variables on which SSAGES will perform its advanced sampling.
"CVs":
[
{
"type": "Torsional",
"name": "mytorsion_1",
"atom_ids": [5,7,9,15]
},
{
"type": "ParticleCoordinate",
"atom_ids": [1],
"dimension": "x"
}
]
Collective variables are specified in an array, where each element is a CV object. A CV can be referenced either by an optional "name" or by its index in the array, beginning with zero. In the example above, "mytorsion_1" and 0 refer to the same CV.
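Since a CV can be referenced either by name or by index, a script that builds input files may want to translate both forms into an index. The hypothetical Python helper below does this for the two CVs from the example above; it is a sketch, not SSAGES's own lookup code.

```python
def cv_index(cvs, ref):
    """Translate a CV reference (name or zero-based index) into an index."""
    if isinstance(ref, int):
        if not 0 <= ref < len(cvs):
            raise IndexError(f"CV index {ref} out of range")
        return ref
    for i, cv in enumerate(cvs):
        if cv.get("name") == ref:  # unnamed CVs have no "name" key
            return i
    raise KeyError(f"no CV named {ref!r}")


cvs = [
    {"type": "Torsional", "name": "mytorsion_1", "atom_ids": [5, 7, 9, 15]},
    {"type": "ParticleCoordinate", "atom_ids": [1], "dimension": "x"},
]
print([cv_index(cvs, r) for r in ["mytorsion_1", 1]])  # prints [0, 1]
```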
Methods
The "methods" property specifies the advanced sampling algorithms that SSAGES will apply to the system.
"methods":
[
{
"type": "Umbrella",
"ksprings": [100],
"output_file": "ulog.dat",
"output_frequency": 10,
"centers": [1.0],
"cvs": ["mytorsion_1"]
},
{
"type": "Metadynamics",
"widths": [0.3],
"height": 1.0,
"hill_frequency": 500,
"lower_bounds": [0.2],
"upper_bounds": [1.4],
"lower_bound_restraints": [100],
"upper_bound_restraints": [100],
"cvs": [1]
}
]
Methods are specified in an array, since it is possible to run multiple methods simultaneously. This is useful if a user is interested in performing advanced sampling on a system subject to some restraint, typically applied via an umbrella. Each method can selectively operate on a subset of CVs by referencing them either by name or index, as shown above.
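When scripting method definitions, a simple consistency check can catch parameter arrays of the wrong length. The sketch below assumes, based on the examples above where each array holds one value for a single CV, that keys such as "ksprings", "centers", and "widths" take one entry per CV listed in the method's "cvs" array; consult each method's documentation for its actual requirements. PER_CV_KEYS and per_cv_mismatches are hypothetical names.

```python
# Assumed per-CV keys, taken from the example methods above.
PER_CV_KEYS = ("ksprings", "centers", "widths", "lower_bounds",
               "upper_bounds", "lower_bound_restraints",
               "upper_bound_restraints")


def per_cv_mismatches(method):
    """Return the per-CV keys whose array length differs from len(method["cvs"])."""
    n = len(method["cvs"])
    return [k for k in PER_CV_KEYS if k in method and len(method[k]) != n]


umbrella = {"type": "Umbrella", "ksprings": [100], "centers": [1.0],
            "output_file": "ulog.dat", "output_frequency": 10,
            "cvs": ["mytorsion_1"]}
print(per_cv_mismatches(umbrella))  # prints []

# Same method, but now acting on two CVs with single-entry arrays.
two_cv = dict(umbrella, cvs=["mytorsion_1", 1])
print(per_cv_mismatches(two_cv))  # prints ['ksprings', 'centers']
```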
Logger
The "logger" property specifies an output file to track any or all CVs as the simulation proceeds.
"logger": {
"frequency": 100,
"output_file": "cvs.dat",
"cvs": [0, 3]
}
If your simulation uses multiple walkers, "output_file" must be an array containing one filename per walker. For instance, with two walkers, use the syntax below.
"logger": {
"frequency": 100,
"output_file": ["cvs_w0.dat","cvs_w1.dat"],
"cvs": [0, 3]
}
The logger is useful for tracking the evolution of the CVs over the course of an advanced sampling calculation. Logging CVs can enable post-simulation reweighting or reveal sampling problems in the system being studied. The logging frequency is set with "frequency", each walker in a multi-walker simulation writes to a separate output file, and "cvs" selects which CVs are logged, so a user can log individual CVs selectively.
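Generating the logger block programmatically makes it easy to keep the number of output files equal to the number of walkers. The helper below is hypothetical; its file-name pattern simply follows the cvs_w0.dat, cvs_w1.dat example above, and the default frequency of 100 is taken from that example.

```python
def logger_config(cvs, walkers, frequency=100, stem="cvs"):
    """Build a "logger" object; with several walkers, one file per walker."""
    if walkers == 1:
        output = f"{stem}.dat"
    else:
        output = [f"{stem}_w{i}.dat" for i in range(walkers)]
    return {"frequency": frequency, "output_file": output, "cvs": list(cvs)}


print(logger_config([0, 3], walkers=2))
# prints {'frequency': 100, 'output_file': ['cvs_w0.dat', 'cvs_w1.dat'], 'cvs': [0, 3]}
```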
Putting It All Together
Combining the previous sections into a single input file yields the following (purely hypothetical) example input for a LAMMPS simulation.
{
"walkers": 2,
"input": ["in.first", "in.second"],
"CVs":
[
{
"type": "Torsional",
"name": "mytorsion_1",
"atom_ids": [5,7,9,15]
},
{
"type": "ParticleCoordinate",
"atom_ids": [1],
"dimension": "x"
}
],
"methods":
[
{
"type": "Umbrella",
"ksprings": [100],
"output_file": ["walker1.dat", "walker2.dat"],
"output_frequency": 10,
"centers": [1.0],
"cvs": ["mytorsion_1"]
},
{
"type": "Metadynamics",
"widths": [0.3],
"height": 1.0,
"hill_frequency": 500,
"lower_bounds": [0.2],
"upper_bounds": [1.4],
"lower_bound_restraints": [100],
"upper_bound_restraints": [100],
"cvs": [1]
}
]
}
To execute this input file with two processors per walker (four processors in total for the two walkers), one would call the command below.
mpirun -np 4 ./ssages inputfile.json
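As noted in the JSON primer, input files are convenient to generate with a script. The Python sketch below builds the example above as a dictionary, writes it to inputfile.json, and prints the matching launch command. The variable procs_each is an illustrative name; the process count is simply the number of walkers times the processors per walker.

```python
import json

config = {
    "walkers": 2,
    "input": ["in.first", "in.second"],
    "CVs": [
        {"type": "Torsional", "name": "mytorsion_1", "atom_ids": [5, 7, 9, 15]},
        {"type": "ParticleCoordinate", "atom_ids": [1], "dimension": "x"},
    ],
    "methods": [
        {"type": "Umbrella", "ksprings": [100],
         "output_file": ["walker1.dat", "walker2.dat"],
         "output_frequency": 10, "centers": [1.0], "cvs": ["mytorsion_1"]},
        {"type": "Metadynamics", "widths": [0.3], "height": 1.0,
         "hill_frequency": 500, "lower_bounds": [0.2], "upper_bounds": [1.4],
         "lower_bound_restraints": [100], "upper_bound_restraints": [100],
         "cvs": [1]},
    ],
}

procs_each = 2  # processors per walker
with open("inputfile.json", "w") as fh:
    json.dump(config, fh, indent=4)

# Total MPI processes = walkers x processors per walker.
print(f"mpirun -np {config['walkers'] * procs_each} ./ssages inputfile.json")
# prints mpirun -np 4 ./ssages inputfile.json
```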